Introduce --step-timeout option for kprove to ensure timely termination#4930
Open
Stevengre wants to merge 5 commits into
Open
Introduce --step-timeout option for kprove to ensure timely termination#4930Stevengre wants to merge 5 commits into
--step-timeout option for kprove to ensure timely termination#4930Stevengre wants to merge 5 commits into
Conversation
Implements #4924: factor kontrol's `--per-depth-timeout` mechanism out of tool-specific code and into a generic policy in `Prover.advance_proof`. `advance_proof` now runs each step under the prover's `step_timeout` (a per-step wall-clock budget). A step that exceeds its budget is interrupted, the prover is asked to `shrink_step` (do less work per step), and the step is retried (the timed-out node was never committed, so it reappears from `get_steps`). If the prover cannot shrink further, advancement stops. Provers without a `step_timeout` run synchronously as before, so `advance_proof` stays generic and gains no new parameter. - Prover: `step_timeout` attribute (whole seconds, None disables) plus no-op `shrink_step()` / `interrupt()` hooks - APRProver: `step_timeout` ctor arg (floored at 1s); `shrink_step` halves `execute_depth`; `interrupt` aborts the in-flight Kore request - KCFGExplore/CTermSymbolic/KoreClient/JsonRpc*/Transport: `interrupt()` that force-unblocks an in-flight single-socket request and reconnects - Unit tests: shrink-until-progress, stop-at-floor, fast-path, disabled
The step-timeout/shrink policy added to Prover.advance_proof was reachable only programmatically. Wire it through the pyk prove command: add a step_timeout field to ProveOptions, a --step-timeout argument (whole seconds, floored at 1 by APRProver), and thread the value from ProveRpc into APRProver. Omitting the flag leaves step_timeout=None, preserving the prior synchronous behavior.
fa4c4fa to
bf77114
Compare
Member
|
This looks pretty good to me. I guess I'm not an expert on whether this is the best way to handle this issue, but one question I have: when we interrupt a step, it will disconnect from that instance of the haskell backend, but does that also stop the haskell backend from working on that step at the same time? Or does the haskell backend keep working on it, we are just now trying some smaller step size ourselves? |
The prior interrupt() shut down the client socket and reconnected. Empirically (legacy/haskell kore-rpc 0.1.145), that does NOT stop the backend: the request runs in a detached worker thread that the server only aborts on an explicit `cancel` JSON-RPC method, never on a dropped connection. So an interrupted step kept a core pinned at ~100% until it finished on its own, while we reconnected and retried a smaller step. Instead, inject a `cancel` request on the live connection. The server aborts the in-flight request, the awaiting thread receives a "Request cancelled" error, and the connection stays open and reusable (no reconnect). Measured: server CPU drops from ~100% to ~2% immediately after interrupt(). - Transport: replace no-arg interrupt() with send_interrupt(data) (raw out-of-band send on the live connection; default no-op; HTTP stays no-op, one conn per request) - SingleSocketTransport: sendall the cancel payload, no shutdown/reconnect - JsonRpcClient: build the `cancel` request and hand it to the transport
Asserts that KoreClient.interrupt() aborts an in-flight (non-terminating) execute promptly with a "Request cancelled" error rather than letting it run to completion, and that the connection survives the cancel and serves a subsequent request. Covers the interrupt mechanism advance_proof's step-timeout policy depends on.
ehildenb
reviewed
Jun 11, 2026
Comment on lines
+1
to
+22
| from __future__ import annotations | ||
|
|
||
| from threading import Event | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| from pyk.proof.proof import Proof, ProofStatus, Prover | ||
|
|
||
| if TYPE_CHECKING: | ||
| from collections.abc import Mapping | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
|
|
||
| class _StepInterrupted(Exception): | ||
| """Raised inside `step_proof` when the prover is interrupted, mimicking a backend abort.""" | ||
|
|
||
|
|
||
| class CountingProof(Proof[int, int]): | ||
| """Minimal proof that needs `target` committed steps to pass.""" | ||
|
|
||
| target: int | ||
| committed: int |
Member
There was a problem hiding this comment.
I think this file is likely not needed, it's not really testing anything useful, and may make testing too brittle for this feature in the future. I'd recommend removing it. The integration test added should be enough.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
advance_proofnow runs each step under the prover'sstep_timeout(a per-step wall-clock budget). A step that exceeds its budget is interrupted, the prover is asked toshrink_step(do less work per step), and the step is retried (the timed-out node was never committed, so it reappears fromget_steps). If the prover cannot shrink further, advancement stops. Provers without astep_timeoutrun synchronously as before, soadvance_proofstays generic and gains no new parameter.Close #4924.
PoC PR: runtimeverification/kontrol#1141